The complete genome sequence of Enterococcus faecalis V583, a
vancomycin-resistant clinical isolate, revealed that more than a quarter of the genome
consists of probable mobile or foreign DNA. One of the predicted mobile 
elements is a previously unknown vanB vancomycin-resistance conjugative 
transposon. Three plasmids were identified, including two pheromone-sensing 
conjugative plasmids, one encoding a previously undescribed pheromone in- 
hibitor. The apparent propensity for the incorporation of mobile elements 
probably contributed to the rapid acquisition and dissemination of drug resis- 
tance in the enterococci. 



The Gram-positive bacterium Enterococcus 
faecalis is a natural inhabitant of the mam- 
malian gastrointestinal tract and is com- 
monly found in soil, sewage, water, and 
food, frequently through fecal contamina- 
tion (1). E. faecalis can withstand oxidative 
stress, desiccation, and extremes of temper- 
ature and pH, and it has high endogenous 
resistance to salinity, bile acids, detergents, 
and antimicrobials (1). 

E. faecalis is an opportunistic pathogen 
that is a major cause of urinary tract infec- 
tions, bacteremia, and infective endocarditis 
(2). The intrinsic resistance of E. faecalis to 
many antibiotics and its acquisition of resistance
to other antimicrobial agents, particularly
vancomycin, which is used to treat serious
infections by drug-resistant Gram-positive
pathogens, have led to the emergence
of E. faecalis as a nosocomial pathogen that
is refractory to most therapeutic options (3).
The recently reported, long-predicted emergence
of vancomycin-resistant Staphylococcus
aureus clinical isolates through transfer of
enterococcal genes is a serious health care
concern (4). Here we report the complete
genome sequence of E. faecalis strain V583 
(5), the first vancomycin-resistant clinical 
isolate reported in the United States (6). The 
genome sequence provides insight into the 
pathogenesis and biology of E. faecalis, the 
role of mobile elements in genome evolution, 
and the transfer of vancomycin resistance. 

A total of 3337 predicted protein-encod- 
ing open reading frames (ORFs) were identi- 
fied on the chromosome and three plasmids 
of E. faecalis V583 (Table 1; fig. S1) (7).
Over a quarter of the E. faecalis V583 ge- 
nome consists of mobile and/or exogenously 
acquired DNA, including seven probable
integrated phage regions, 38 insertion sequence
(IS) elements, multiple conjugative and composite
transposons, a putative pathogenicity
island, and integrated plasmid genes. To our
knowledge, this represents one of the high- 
est proportions of mobile elements ob- 
served in a bacterial genome. The plethora 
of mobile elements probably contributed to 
the accumulation of virulence and drug 
resistance factors by E. faecalis. 

Vancomycin resistance in E. faecalis
V583 appears to be encoded within a previously
unknown mobile element (EF2282-EF2334)
with some similarities to the probable
E. faecalis vanB vancomycin-resistance
conjugative transposon Tn1549 (8). The
vancomycin-resistance genes (EF1955-EF1963)
encode vancomycin resistance via synthesis
of modified peptidoglycan precursors
terminating in D-lactate (9), and they are
essentially identical to the vanB genes from Tn1549.
The remainder of the element is highly divergent
from Tn1549, with multiple insertions,
deletions, and rearrangements (Fig. 1);
relatively low sequence similarity between
conserved genes; and a different recombinational
system (EF2283). Even though E. faecalis
V583 is the earliest known vancomycin-resistant
clinical isolate from the United
States, the conjugative transposon-like
features and atypical trinucleotide composition
of this element indicate it was likely obtained as a
cassette by lateral gene transfer. It is also
flanked by Tn916-like genes (Fig. 1), which
may have played a role in the acquisition of
this element.

Highly similar Tn916-like genes are also
found in association with a locus (EF1869-EF1863)
encoding homologs of the Streptococcus
pneumoniae VncRS two-component
signal transduction system and Vex secretion
proteins (Fig. 1). The vncRS locus has been
associated with vancomycin tolerance in S.
pneumoniae via mutation of vncS (10), although
recent evidence has cast doubt on this
association (11). In E. faecalis V583, the
vncS gene (EF1866) is disrupted by a nonfunctional
ISL3 family insertion sequence.
The possible role of this element in vancomycin
tolerance in E. faecalis is unclear, but
it is flanked by copies of IS256 and may also
have been laterally acquired.

Table 1. General features of the E. faecalis genome. No., number; rRNA, ribosomal RNA; tRNA, transfer
RNA.

                                   Chromosome   pTEF1   pTEF2   pTEF3
Size (base pairs)                     3218031   66320   57660   17963
G+C content (%)                          37.5    34.4    33.9    33.3
Protein-coding genes
  No. similar to known proteins          1760      37      22      10
  No. with unknown function               221       2       1       1
  No. of conserved hypotheticals          495      18      10       3
  No. with no database match              706      17      29       5
  Total                                  3182      74      62      19
Average ORF size (base pairs)             889     743     816     721
Coding (%)                               87.9    83.0    87.8    76.4
rRNA genes                                 12       0       0       0
tRNA genes                                 68       0       0       0
Structural RNA                              2       0       0       0
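
The protein-coding gene counts in Table 1 can be cross-checked by simple addition: the four annotation categories should sum to each replicon total, and the replicon totals should sum to the 3337 ORFs quoted in the text. The short Python sketch below does this arithmetic; the dictionary layout and the names TABLE1 and replicon_total are illustrative assumptions, not part of the original analysis.

```python
# Protein-coding gene counts from Table 1, per replicon.
# Tuple order: similar to known proteins, unknown function,
# conserved hypotheticals, no database match.
TABLE1 = {
    "Chromosome": (1760, 221, 495, 706),
    "pTEF1": (37, 2, 18, 17),
    "pTEF2": (22, 1, 10, 29),
    "pTEF3": (10, 1, 3, 5),
}


def replicon_total(name):
    """Sum the four annotation categories for one replicon."""
    return sum(TABLE1[name])


# Per-replicon totals and the genome-wide ORF count.
totals = {name: replicon_total(name) for name in TABLE1}
genome_total = sum(totals.values())

print(totals)
print(genome_total)
```

The computed totals agree with the "Total" row of Table 1 and with the genome-wide ORF count.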

Thirty-eight IS elements were identified
(table S1), with three types predominating:
ISEf1, IS256, and IS1216. There are two
clusters of IS elements on the chromosome
(fig. S1). One is associated with a pathogenicity
island and integrated plasmid genes.
The second cluster includes several types of 
IS elements that flank a region of atypical 
trinucleotide composition (EF1860-EF1858) 
that may have been acquired by lateral gene 
transfer [Supporting Online Material (SOM) 
Text] and encodes three of the four steps of 
pantothenate biosynthesis. 
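
The text does not state which statistic was used to call a region "atypical" in trinucleotide composition. As a minimal sketch of the general idea only, the Python code below compares the overlapping trinucleotide frequencies of a window with those of the whole genome using a chi-square-like distance. The function names, the pseudocount, and the toy sequences in the usage note are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product

# All 64 unambiguous trinucleotides.
TRINUCS = ["".join(p) for p in product("ACGT", repeat=3)]


def trinuc_freqs(seq):
    """Relative frequencies of overlapping trinucleotides in seq."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 3] for i in range(len(seq) - 2))
    # Only unambiguous trinucleotides contribute to the total.
    total = sum(counts[t] for t in TRINUCS)
    if total == 0:
        raise ValueError("sequence has no unambiguous trinucleotides")
    return {t: counts[t] / total for t in TRINUCS}


def atypicality(window, genome, pseudo=1e-6):
    """Chi-square-like distance between window and genome composition.

    The pseudocount (an assumed value) avoids division by zero for
    trinucleotides that are absent from the genome.
    """
    w = trinuc_freqs(window)
    g = trinuc_freqs(genome)
    return sum((w[t] - g[t]) ** 2 / (g[t] + pseudo) for t in TRINUCS)
```

With a toy "genome" such as "ACGT" * 50, a window of the same repeat scores close to zero, whereas a window of 60 adenines scores far higher. In practice the score would be computed for sliding windows along the chromosome, and windows far above the genome-wide background would be flagged as candidate foreign DNA.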

A large pathogenicity island has previous- 
ly been identified in E. faecalis V583 (12) 
(EF0479-EF0628), including genes for ag- 
gregation substance, cytolysin, and other pos- 
sible virulence or adaptation genes. Trinucle- 
otide composition analysis indicated that 
most of this island has highly atypical composition,
except for a region containing integrated
plasmid genes from a pTEF1-like plasmid
(fig. S1). The presence of multiple IS
elements and the integrated plasmid genes 
hints at a complex evolutionary history for 
this element. It is flanked at one end by an 
integrase gene, possibly responsible for inte- 
gration of this element. 

Fig. 1. Linear representation of the E. faecalis V583 vanB
vancomycin-resistance gene region and its relationship with the vancomycin-resistance
transposon Tn1549 (8) and the E. faecalis V583 vncRS/vex locus. Genes are
shown as arrowheads (not to scale) colored by predicted function or
transposable element: black, vanB vancomycin-resistance genes; magenta,
Tn1549-like genes; orange, Tn916-like genes; green, transposases; red,
vncRS/vex locus; brown, group II intron; light blue, transposon resolvase. Black
lines connect best matches (BLAST P-value < 1 × 10^-5), and red lines
connect best matches with greater than 99% identity. Genes are labeled by
name or by the appropriate ORF or locus numbers.

There are seven regions derived from
probable integrated phage (fig. S1). These
putative prophages are most closely related to
phage from other low-GC Gram-positive bacteria.
The integrated phage regions encode
multiple homologs of Streptococcus mitis
PblA and PblB, which have been implicated
in binding human platelets, an interaction
important in the pathogenesis of infective
endocarditis (13). A ferrochelatase gene
(EF1989) encoded within one of the phage
regions may allow E. faecalis to utilize
coproporphyrinogen III for heme synthesis
(SOM Text).

A variety of diverse plasmids have been
previously described in E. faecalis, particularly
conjugative plasmids that encode a mating
response to sex pheromone peptides secreted
by plasmid-free recipient strains (14).
Three plasmids are present in E. faecalis
V583 (Table 1; fig. S2): pTEF1 and pTEF2
are structurally similar to the archetypal
pheromone-responsive plasmids pAD1 (15) and
pCF10 (14), respectively, and pTEF3 belongs
to the family of pAMβ1 broad-host-range
plasmids.

The sex pheromone inhibitor (iAD1) and
surface aggregation substance (Asa1) encoded
by pTEF1 are identical to those of pAD1,
and both plasmids share extensive regions of
sequence similarity (fig. S2). There is a 31-kb
inversion in pTEF1 relative to pAD1 that
probably affects regulation of conjugation in
E. faecalis V583 (SOM Text). The unique
regions in pTEF1 include a Tn4001-like
transposon encoding aminoglycoside resistance
and another IS-flanked element carrying
erythromycin resistance and multidrug
resistance genes.

pTEF2 and the sex pheromone plasmid
pCF10 share regions of similarity, including
identical copies of the conjugation genes
prgA-prgB-prgC, but pTEF2 lacks the pCF10
pheromone inhibitor gene prgQ and instead encodes a
previously undescribed predicted pheromone
inhibitor (EFB0005.1) in the equivalent
position. pTEF3 is a nonconjugative plasmid
but has acquired a pTEF2-like prgZ pheromone
receptor, whose gene is adjacent to
multiple IS elements. The occurrence of a
novel pheromone inhibitor on pTEF2 suggests
that a broad diversity of E. faecalis pheromones
and pheromone inhibitors may exist in nature. Five sex
pheromones encoded within lipoprotein signal
peptides have been identified in the genome
of E. faecalis V583 (16). An additional
76 predicted lipoproteins were identified (table S2),
some of which could represent previously
unknown pheromone precursors.

The chromosome of E. faecalis contains
at least three integrated plasmid remnants.
Two of these regions encode homologs of
aggregation substance (EF0485, EF0149)
that may be important for virulence (fig. S3).
Plasmid-encoded aggregation substance
is a surface protein that enhances
conjugative transfer and has been implicated
in adhesion to colonic mucosal fibronectin
and in translocation across intestinal
epithelium (17). One of the integrated plasmid
regions (EF0506-EF0485) is located
within the probable pathogenicity island
(12); the other two regions include genes of
plasmid, phage, and conjugative transposon
origin (fig. S3; SOM Text) and encode
lipoproteins (EF0164 and EF2512) whose
signal peptides resemble those of known
pheromone precursors. The presence of
multiple integrated plasmid remnants and
three resident plasmids in E. faecalis V583
emphasizes the importance of plasmids in
genome plasticity of the enterococci.

Comparison of the predicted protein set of 
E. faecalis with those of other sequenced 
genomes confirmed the relationship of E. 
faecalis within the low-GC Gram-positive 
bacteria. Over 85% of E. faecalis ORFs with 
significant database matches had their best 
match to other sequenced low-GC Gram- 
positive organisms, and 519 E. faecalis ORFs 
were conserved in a set of 10 sequenced 
low-GC Gram-positive organisms (table S3) 
(5). The distribution of these conserved genes 
in the E. faecalis genome revealed a number 
of regions with a low abundance of conserved 
genes (fig. S1), largely corresponding to the
identified phage, integrated plasmid, patho- 
genicity island, and conjugative transposon 
regions. This set of conserved genes is in- 
volved in essential processes such as tran- 
scription, translation, and protein synthesis 
but also includes a considerable number of 
proteins from large paralogous families such 
as the PTS and ABC transporter families. 

There is essentially no large-scale gene 
synteny between the E. faecalis genome 
and that of any sequenced low-GC Gram- 
positive bacterium. The multiplicity of mo- 
bile or foreign elements such as phage and 
IS elements suggests the E. faecalis ge- 
nome is highly malleable and may have 
undergone multiple rearrangement events, 
explaining the lack of gene synteny. De- 
spite this lack of synteny, there is a strong 
transcriptional skew (fig. S1) as found in
other sequenced low-GC Gram-positive 
bacteria. Genes encoded within the mobile 
elements also show a transcriptional skew; 
within the phage regions, for instance, 90% 
of the ORFs align with the direction of 
replication. Strong selective pressure for 
genes to be transcribed in the direction of 
replication appears to be a common feature 
of the low-GC Gram-positive bacteria. 
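
The transcriptional skew described above is the fraction of genes whose direction of transcription matches the direction of replication fork movement. The Python sketch below shows one way such a fraction could be computed for a circular chromosome. The coordinate convention, the function names, and the toy gene list in the usage note are hypothetical and are not taken from the genome annotation.

```python
def co_oriented(start, strand, ori, ter, length):
    """True if a gene is transcribed in the direction of replication.

    Coordinates are 0-based positions on a circular chromosome of the
    given length. On the replichore running clockwise from ori to ter
    the fork moves toward increasing coordinates, so forward-strand
    ("+") genes are co-directional; on the other replichore the
    reverse-strand ("-") genes are co-directional.
    """
    offset = (start - ori) % length
    clockwise_span = (ter - ori) % length
    on_clockwise_replichore = offset < clockwise_span
    return (strand == "+") == on_clockwise_replichore


def skew_fraction(genes, ori, ter, length):
    """Fraction of (start, strand) genes co-oriented with replication."""
    hits = sum(co_oriented(s, d, ori, ter, length) for s, d in genes)
    return hits / len(genes)
```

For a made-up 1000-bp circle with ori at 0 and ter at 500, the gene list [(100, "+"), (200, "+"), (300, "-"), (700, "-"), (800, "-"), (900, "+")] has four of six genes co-oriented with replication. Restricting the gene list to the ORFs of a phage region would give the region-specific figure of the kind reported in the text.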

Analysis of the transport and metabolic
capabilities of E. faecalis V583 emphasizes
the importance of fermentation of nonabsorbed
sugars in the gastrointestinal tract in
its life-style. E. faecalis has 35 probable PTS-type
sugar transporters, comparable to Listeria
species and considerably more than any
other sequenced organism, as well as ABC-type
and other sugar uptake systems (table S4).
Consistent with these transport capabilities,
E. faecalis encodes pathways for the
utilization of more than 15 different sugars
(table S4). Energy production from these substrates
occurs via glycolysis or the pentose
phosphate pathway, with the tricarboxylic
acid (TCA) cycle absent.

E. faecalis is one of the few bacteria that
are substantial producers of extracellular
superoxide (O2^-) (18), and an array of oxidative
stress resistance mechanisms is evident from its
genome (table S5). E. faecalis is well endowed
with cation homeostasis mechanisms (table S5),
including 14 predicted metal ion P-type ATPases,
more than any other currently sequenced bacterium;
these mechanisms probably contribute to its pH, salt,
metal, and desiccation resistance.
Despite its stress resistance capabilities,
E. faecalis possesses a modest collection of
regulatory genes (table S6), including only
three alternate sigma factors.

E. faecalis is known to adhere to a variety 
of host cells or extracellular matrix compo- 
nents. Adhesins such as aggregation sub- 
stance, MSCRAMM, hemagglutinin, and 
other virulence factors have been described 
for Enterococcus spp. (1). A comprehensive
genome-wide exploration for signal sequenc- 
es, lipoprotein motifs, and other potential 
host cell components or binding motifs yield- 
ed an additional 134 putative surface-exposed 
proteins that may be associated with early 
colonization stages or virulence (table S7). 

Forty-seven candidates with potential 
choline- or integrin-binding motifs were 
identified that may play a role in adherence 
and internalization processes (table S7). A 
family of four probable adhesin lipoproteins 
was identified, including a characterized en- 
docarditis-specific antigen (EF2076). A 
paralogous family of 21 LPxTG-motif cell 
wall surface anchor proteins is present, as are 
three corresponding sortases (EF3056, 
EF1094, and EF2524). Several of the genes 
for LPxTG proteins have an atypical nucleo- 
tide composition, suggesting a foreign origin. 
Translocation across the intestinal epithelium
(19) may be facilitated by homologs of L.
monocytogenes internalin A (EF2686), an S. 
aureus exfoliative toxin (EF0645), and prob- 
able secreted hyaluronidases (EF3023 and 
EF0818) and a protease (EF1818). 
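
Cell wall surface anchor proteins of the kind described above are recognized by a conserved LPxTG motif near the C-terminus, where x is any residue. The Python sketch below is a minimal screen for that motif. The 50-residue tail window and the function name are assumptions, the peptide in the usage note is invented, and a real screen would also require the hydrophobic segment and charged tail that follow the motif.

```python
import re

# LPxTG sorting signal: Leu-Pro-any residue-Thr-Gly.
LPXTG = re.compile(r"LP[A-Z]TG")


def has_lpxtg_anchor(protein, tail=50):
    """True if an LPxTG motif occurs in the last `tail` residues.

    The tail length is an assumed cutoff, chosen only for illustration.
    """
    return LPXTG.search(protein.upper()[-tail:]) is not None
```

For example, the invented peptide "M" + "A" * 80 + "LPKTGE" + "V" * 20 + "KRR" passes this screen, whereas a sequence carrying the motif only at its N-terminus does not.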

Antigenic or phase variation of surface 
structures is a common immune evasion tac- 
tic amongst pathogens and has been implicat- 
ed in activating conjugation function in E. 
faecalis (20). Out of 134 putative surface- 
exposed proteins, 65 were found to contain 
stretches of homopolymeric sequence or iter- 
ative nucleotide motifs located within the 
predicted ORF or promoter region, which 
may enable phase variation via a slippage-type
mechanism (table S7).
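
Homopolymeric stretches of the kind that could support slippage can be located with a simple pattern search. The Python sketch below reports single-base runs of at least a given length. The default threshold of seven bases and the function name are assumptions, since the text does not state the criteria used, and iterative multi-base motifs would need a separate pattern.

```python
import re


def homopolymer_tracts(dna, min_len=7):
    """Return (start, base, length) for single-base runs >= min_len.

    Positions are 0-based. min_len is an assumed threshold.
    """
    # One captured base followed by at least (min_len - 1) repeats of it.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [(m.start(), m.group(1), len(m.group(0)))
            for m in pattern.finditer(dna.upper())]
```

Applied to the toy sequence "ATGC" + "A" * 8 + "CGT" + "G" * 3, the default threshold reports only the eight-base adenine run, while min_len=3 also reports the three-base guanine run.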

An unprecedented amount of the E. fae- 
calis V583 genome consists of intact or par- 
tial mobile elements. Many of these regions 
have complex mosaic structures composed of
different elements, suggesting they are "hot- 
spots" or "graveyards" for mobile element 
insertion. This apparent propensity for the 
incorporation of mobile elements probably 
contributed to the rapid acquisition and dis- 
semination of drug resistance in the entero- 
cocci and suggests that they act as a reservoir 
for the further dissemination of drug resis- 
tance traits such as vancomycin resistance 
via mobile elements and/or conjugative 
plasmids. The complete genome sequence 
of E. faecalis V583 has enabled the identi- 
fication of numerous predicted virulence 
factors and surface-exposed proteins that 
may facilitate the development of therapeu- 
tic approaches to combat this important 
nosocomial pathogen. 


A Genomic View of the
Human-Bacteroides
thetaiotaomicron Symbiosis

Jian Xu, Magnus K. Bjursell, Jason Himrod, Su Deng, Lynn K. 
Carmichael, Herbert C. Chiang, Lora V. Hooper, 
Jeffrey I. Gordon* 

The human gut is colonized with a vast community of indigenous microor- 
ganisms that help shape our biology. Here, we present the complete genome 
sequence of the Gram-negative anaerobe Bacteroides thetaiotaomicron, a dom- 
inant member of our normal distal intestinal microbiota. Its 4779-member 
proteome includes an elaborate apparatus for acquiring and hydrolyzing oth- 
erwise indigestible dietary polysaccharides and an associated environment- 
sensing system consisting of a large repertoire of extracytoplasmic function 
sigma factors and one- and two-component signal transduction systems. These 
and other expanded paralogous groups shed light on the molecular mechanisms 
underlying symbiotic host-bacterial relationships in our intestine. 



Department of Molecular Biology and Pharmacology,
Washington University School of Medicine, St. Louis,
MO 63110, USA.

*To whom correspondence should be addressed.
E-mail: jgordon@molecool.wustl.edu



A major theme of life on our planet is the
complex and beneficial interactions that occur
between eukaryotes and prokaryotes.
Humans are no exception. As adults, we
harbor diverse communities of microorganisms
whose total number exceeds the sum
of all of our somatic and germ cells (1). As
yet, the ways in which these communities
contribute to normal postnatal development
and adult physiology are largely unexplored.
The human gut contains the largest
such collection of microbes [10^11 organisms
per ml proximal colonic contents (1)].
An estimated 2 to 4 million genes are embedded
in the aggregate genome (microbiome)
of an intestinal community of ~500
to 1000 bacterial species (2). The products
of these genes provide metabolic capacities
not encoded in our own genome (3).

The gut microbiota is a key regulator of 
the human immune system; it acts to induce 
tolerance to microbial epitopes and thus to 
reduce responses to commonly encountered 
foodstuffs and other environmental anti- 
gens (4). Functional genomic studies of 
germfree mice colonized with components 
of the human intestinal microbiota are re- 
vealing other functions affected by indige- 
nous bacteria, including fortification of the 
mucosal barrier and angiogenesis (5-7).



These observations emphasize the need to 
understand more about the roles played by 
the microbiota in host biology, as well as 
the potential for control and modulation. 

Here, we describe the complete 6.26-Mb
genome sequence of the Gram-negative anaerobe
Bacteroides thetaiotaomicron (figs. S1 to
S5 in supporting online material). This genetically
manipulatable organism is a predominant
member of the normal human (and murine)
distal small intestinal and colonic microbiota
(8) and has been used as a model for
understanding the impact of constituents of
the microbiota on gut gene expression (5, 9).
The genome sequences of members of the
Bacteroidetes phylum, which diverged early
in the evolution of Bacteria (10), have not yet
been reported.

The B. thetaiotaomicron type strain, VPI- 
5482 (ATCC 29148), was originally isolated 
from the feces of a healthy adult human. Of 
the 4779 predicted proteins in its proteome, 
2782 (58%) were assigned putative functions 
on the basis of homology to other known 
proteins. Of the predicted proteins, 848 
(18%) have homology to proteins with no 
known function, whereas 1149 (24%) have
no appreciable homology to entries in public 
databases. The most markedly expanded 
paralogous groups are involved in polysac- 
charide uptake and degradation (glycosylhy- 
drolases, cell-surface carbohydrate-binding 
proteins); capsular polysaccharide biosynthe- 
sis (e.g., glycosyltransferases); environmen- 
tal sensing and signal transduction [one- and 
two-component systems; extracytoplasmic 
function (ECF)-type sigma factors]; and 
DNA mobilization (transposases, conjugative 
transposons) (table S1). These expansions reveal
strategies used by B. thetaiotaomicron to